R Bootcamp, Module 6: Data manipulation with the tidyverse

August 2019, UC Berkeley

Chris Paciorek (based on materials developed by Kellie Ottoboni, Rochelle Terman, Nima Hejazi, and Chris Krogslund)

Overview

It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data. (Dasu and Johnson, 2003)

Thus before you can even get to doing any sort of sophisticated analysis or plotting, you'll generally first need to:

  1. Manipulate data frames, e.g., by filtering, summarizing, and conducting calculations across groups.
  2. Tidy data into the appropriate format.

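There are two competing schools of thought within the R community: one favors sticking with base R functions for manipulating and tidying data, and the other favors the tidyverse packages.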

We'll show you some of the tidyverse tools so you can make an informed decision about whether you want to use base R or these newfangled packages.

Data frame Manipulation using Base R Functions

So far, you've seen the basics of manipulating data frames, e.g., subsetting, merging, and basic calculations. For instance, we can use base R functions to calculate summary statistics across groups of observations, e.g., the mean GDP per capita within each continent:

mean(gap[gap$continent == "Africa", "gdpPercap"])
## [1] 2193.755
mean(gap[gap$continent == "Americas", "gdpPercap"])
## [1] 7136.11
mean(gap[gap$continent == "Asia", "gdpPercap"])
## [1] 7902.15

But this isn't ideal because it involves a fair bit of repetition. Repeating yourself will cost you time, both now and later, and potentially introduce some nasty bugs.

Data frame Manipulation using dplyr

Luckily, the dplyr package provides a number of very useful functions for manipulating data frames. These functions will save you time by reducing repetition. As an added bonus, you might even find the dplyr grammar easier to read.

Here we're going to cover 6 of the most commonly used functions as well as using pipes (%>%) to combine them.

  1. select()
  2. filter()
  3. group_by()
  4. summarize()
  5. mutate()
  6. arrange()

If you have not installed this package earlier, please do so now:

# NOT run
install.packages('dplyr')

Now let's load the package:

library(dplyr)

dplyr::select

Imagine that we just received the gapminder dataset, but are only interested in a few variables in it. We could use the select() function to keep only the columns corresponding to variables we select.

year_country_gdp_dplyr <- select(gap, year, country, gdpPercap)
head(year_country_gdp_dplyr)
##   year     country gdpPercap
## 1 1952 Afghanistan  779.4453
## 2 1957 Afghanistan  820.8530
## 3 1962 Afghanistan  853.1007
## 4 1967 Afghanistan  836.1971
## 5 1972 Afghanistan  739.9811
## 6 1977 Afghanistan  786.1134

If we open up year_country_gdp_dplyr, we'll see that it only contains the year, country and gdpPercap columns. This is equivalent to the base R subsetting operation:

year_country_gdp_base <- gap[,c("year", "country", "gdpPercap")]
head(year_country_gdp_base)
##   year     country gdpPercap
## 1 1952 Afghanistan  779.4453
## 2 1957 Afghanistan  820.8530
## 3 1962 Afghanistan  853.1007
## 4 1967 Afghanistan  836.1971
## 5 1972 Afghanistan  739.9811
## 6 1977 Afghanistan  786.1134

We can even check that these two data frames are equivalent:

# checking equivalence: TRUE indicates an exact match between these objects
all.equal(year_country_gdp_dplyr, year_country_gdp_base)
## [1] TRUE

But, as we will see, dplyr makes for much more readable, efficient code because of its pipe operator.

piping with dplyr

Above, we used what's called "normal" grammar, but the strengths of dplyr lie in combining several functions using pipes. Pipes take the input on the left side of the %>% symbol and pass it in as the first argument to the function on the right side. Since the pipe grammar is unlike anything we've seen in R before, let's repeat what we've done above using pipes.

year_country_gdp <- gap %>% select(year, country, gdpPercap)

First we summon the gapminder data frame and pass it on to the next step using the pipe symbol %>%. The second step is the select() function. In this case we don't specify which data object we use in the call to select() since we've piped it in.
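To make the pipe's behavior concrete, here is a minimal sketch on a toy data frame. The object `toy` is a made-up stand-in for gap, not part of the course data. It checks that the piped call and the ordinary call with the data frame as first argument give the same result.

```r
library(dplyr)

# Hypothetical three-row stand-in for the gapminder data
toy <- data.frame(
  country   = c("A", "B", "C"),
  year      = c(1952, 1952, 1957),
  gdpPercap = c(100, 200, 300)
)

# x %>% f(y) is shorthand for f(x, y)
piped  <- toy %>% select(year, gdpPercap)
nested <- select(toy, year, gdpPercap)

# Both routes should produce the same two-column data frame
identical(piped, nested)
```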

Fun Fact: There is a good chance you have encountered pipes before in the shell. In R, a pipe symbol is %>% while in the shell it is |. But the concept is the same!

dplyr::filter

Now let's say we're only interested in African countries. We can combine select and filter to select only the observations where continent is Africa.

year_country_gdp_africa <- gap %>%
    filter(continent == "Africa") %>%
    select(year,country,gdpPercap)

As with last time, first we pass the gapminder data frame to the filter() function, then we pass the filtered version of the gapminder data frame to the select() function.

To clarify, both the select and filter functions subset the data frame. The difference is that select extracts certain columns, while filter extracts certain rows.

Note: The order of operations is very important in this case. If we used 'select' first, filter would not be able to find the variable continent since we would have removed it in the previous step.
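The point about ordering can be checked on a small example. This is only a sketch: `toy` and its values are invented for illustration, and tryCatch() is used so that the failing pipeline does not stop the script.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  country   = c("A", "B", "C"),
  continent = c("Africa", "Asia", "Africa"),
  gdpPercap = c(100, 200, 300)
)

# filter() first: continent is still present when it is needed
ok <- toy %>%
  filter(continent == "Africa") %>%
  select(country, gdpPercap)

# select() first drops continent, so the later filter() cannot find it;
# tryCatch() turns the resulting error into a message string
bad <- tryCatch(
  toy %>%
    select(country, gdpPercap) %>%
    filter(continent == "Africa"),
  error = function(e) "error: continent is no longer in the data"
)

nrow(ok)  # 2 rows, one per African country in the toy data
bad
```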

dplyr Calculations Across Groups

A common task you'll encounter when working with data is running calculations on different groups within the data. For instance, what if we wanted to calculate the mean GDP per capita for each continent?

In base R, you would have to run the mean() function for each subset of data:

mean(gap$gdpPercap[gap$continent == "Africa"])
## [1] 2193.755
mean(gap$gdpPercap[gap$continent == "Americas"])
## [1] 7136.11
mean(gap$gdpPercap[gap$continent == "Asia"])
## [1] 7902.15
mean(gap$gdpPercap[gap$continent == "Europe"])
## [1] 14469.48
mean(gap$gdpPercap[gap$continent == "Oceania"])
## [1] 18621.61

That's a lot of repetition! To make matters worse, what if we wanted to add these values to our original data frame as a new column? We would have to write something like this:

gap$mean.continent.GDP <- NA

gap$mean.continent.GDP[gap$continent == "Africa"] <- mean(gap$gdpPercap[gap$continent == "Africa"])

gap$mean.continent.GDP[gap$continent == "Americas"] <- mean(gap$gdpPercap[gap$continent == "Americas"])

gap$mean.continent.GDP[gap$continent == "Asia"] <- mean(gap$gdpPercap[gap$continent == "Asia"])

gap$mean.continent.GDP[gap$continent == "Europe"] <- mean(gap$gdpPercap[gap$continent == "Europe"])

gap$mean.continent.GDP[gap$continent == "Oceania"] <- mean(gap$gdpPercap[gap$continent == "Oceania"])

You can see how this can get pretty tedious, especially if we want to calculate more complicated or refined statistics. We could use loops or apply functions, but these can be difficult, slow, or error-prone.
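For comparison, here is a minimal base R sketch of the "apply functions" route mentioned above, using tapply() and ave(). The data frame `toy` is invented for illustration; the column name mean.continent.GDP follows the code above.

```r
# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa", "Asia"),
  gdpPercap = c(100, 200, 300, 400)
)

# One mean per continent, without repeating the subsetting code
means <- tapply(toy$gdpPercap, toy$continent, mean)

# Attach each row's continent mean as a new column
toy$mean.continent.GDP <- ave(toy$gdpPercap, toy$continent, FUN = mean)

means
toy
```

This avoids the copy-and-paste, but the two steps use different functions with different argument orders, which is part of what makes this style easy to get wrong.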

dplyr split-apply-combine

The abstract problem we're encountering here is known as "split-apply-combine":

We want to split our data into groups (in this case continents), apply some calculations on each group, then combine the results together afterwards.

Module 4 gave some ways to do split-apply-combine operations using the apply family of functions, but those can be error-prone and messy.

Luckily, dplyr offers a much cleaner, more straightforward solution to this problem.

# remove this column -- there are two easy ways!
gap <- gap %>% select(-mean.continent.GDP)
# OR
gap$mean.continent.GDP <- NULL

dplyr::group_by

We've already seen how filter() can help us select observations that meet certain criteria (in the above: continent == "Africa"). More helpful, however, is the group_by() function, which will essentially use every unique value that we could have used as a criterion in filter().

A grouped_df can be thought of as a list where each item in the list is a data.frame that contains only the rows corresponding to a particular value of continent (at least in the example above).
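A quick way to see what group_by() does is to inspect the grouped object. The sketch below uses an invented data frame `toy`; n_groups() and group_size() are dplyr helpers for examining the grouping.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Africa", "Asia", "Africa", "Europe"),
  gdpPercap = c(100, 200, 300, 400)
)

grouped <- toy %>% group_by(continent)

# The rows themselves are unchanged; only grouping metadata is added
class(grouped)       # includes "grouped_df"
n_groups(grouped)    # one group per distinct continent
group_size(grouped)  # number of rows in each group
```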

dplyr::summarize

group_by() on its own is not particularly interesting. It's much more exciting used in conjunction with the summarize() function. This will allow us to create new variable(s) by applying transformations to variables in each of the continent-specific data frames. In other words, using the group_by() function, we split our original data frame into multiple pieces, to which we then apply summary functions (e.g., mean() or sd()) within summarize(). The output is a new data frame reduced in size, with one row per group.

gdp_bycontinents <- gap %>%
    group_by(continent) %>%
    summarize(mean_gdpPercap = mean(gdpPercap))
head(gdp_bycontinents)
## # A tibble: 5 x 2
##   continent mean_gdpPercap
##   <chr>              <dbl>
## 1 Africa             2194.
## 2 Americas           7136.
## 3 Asia               7902.
## 4 Europe            14469.
## 5 Oceania           18622.

That allowed us to calculate the mean gdpPercap for each continent. But it gets even better -- the function group_by() allows us to group by multiple variables. Let's group by year and continent.

gdp_bycontinents_byyear <- gap %>%
    group_by(continent, year) %>%
    summarize(mean_gdpPercap = mean(gdpPercap))
head(gdp_bycontinents_byyear)
## # A tibble: 6 x 3
## # Groups:   continent [1]
##   continent  year mean_gdpPercap
##   <chr>     <int>          <dbl>
## 1 Africa     1952          1253.
## 2 Africa     1957          1385.
## 3 Africa     1962          1598.
## 4 Africa     1967          2050.
## 5 Africa     1972          2340.
## 6 Africa     1977          2586.

That is already quite powerful, but it gets even better! You're not limited to defining 1 new variable in summarize().

gdp_pop_bycontinents_byyear <- gap %>%
    group_by(continent, year) %>%
    summarize(mean_gdpPercap = mean(gdpPercap),
              sd_gdpPercap = sd(gdpPercap),
              mean_pop = mean(pop),
              sd_pop = sd(pop))
head(gdp_pop_bycontinents_byyear)
## # A tibble: 6 x 6
## # Groups:   continent [1]
##   continent  year mean_gdpPercap sd_gdpPercap mean_pop    sd_pop
##   <chr>     <int>          <dbl>        <dbl>    <dbl>     <dbl>
## 1 Africa     1952          1253.         983. 4570010.  6317450.
## 2 Africa     1957          1385.        1135. 5093033.  7076042.
## 3 Africa     1962          1598.        1462. 5702247.  7957545.
## 4 Africa     1967          2050.        2848. 6447875.  8985505.
## 5 Africa     1972          2340.        3287. 7305376. 10130833.
## 6 Africa     1977          2586.        4142. 8328097. 11585184.

dplyr::mutate

What if we wanted to add these values to our original data frame instead of creating a new object? For this, we can use the mutate() function, which is similar to summarize() except it creates new variables in the same data frame that you pass into it.

gap_with_extra_vars <- gap %>%
    group_by(continent, year) %>%
    mutate(mean_gdpPercap = mean(gdpPercap),
           sd_gdpPercap = sd(gdpPercap),
           mean_pop = mean(pop),
           sd_pop = sd(pop))
head(gap_with_extra_vars)
## # A tibble: 6 x 10
## # Groups:   continent, year [6]
##   country  year    pop continent lifeExp gdpPercap mean_gdpPercap
##   <chr>   <int>  <dbl> <chr>       <dbl>     <dbl>          <dbl>
## 1 Afghan…  1952 8.43e6 Asia         28.8      779.          5195.
## 2 Afghan…  1957 9.24e6 Asia         30.3      821.          5788.
## 3 Afghan…  1962 1.03e7 Asia         32.0      853.          5729.
## 4 Afghan…  1967 1.15e7 Asia         34.0      836.          5971.
## 5 Afghan…  1972 1.31e7 Asia         36.1      740.          8187.
## 6 Afghan…  1977 1.49e7 Asia         38.4      786.          7791.
## # … with 3 more variables: sd_gdpPercap <dbl>, mean_pop <dbl>,
## #   sd_pop <dbl>

We can also use mutate() to create new variables prior to (or even after) summarizing information. Note that mutate() does not need to operate on grouped data and it can do element-wise transformations.

gdp_pop_bycontinents_byyear <- gap %>%
    mutate(gdp_billion = gdpPercap*pop/10^9) %>%
    group_by(continent, year) %>%
    summarize(mean_gdpPercap = mean(gdpPercap),
              sd_gdpPercap = sd(gdpPercap),
              mean_pop = mean(pop),
              sd_pop = sd(pop),
              mean_gdp_billion = mean(gdp_billion),
              sd_gdp_billion = sd(gdp_billion))
head(gdp_pop_bycontinents_byyear)
## # A tibble: 6 x 8
## # Groups:   continent [1]
##   continent  year mean_gdpPercap sd_gdpPercap mean_pop sd_pop
##   <chr>     <int>          <dbl>        <dbl>    <dbl>  <dbl>
## 1 Africa     1952          1253.         983. 4570010. 6.32e6
## 2 Africa     1957          1385.        1135. 5093033. 7.08e6
## 3 Africa     1962          1598.        1462. 5702247. 7.96e6
## 4 Africa     1967          2050.        2848. 6447875. 8.99e6
## 5 Africa     1972          2340.        3287. 7305376. 1.01e7
## 6 Africa     1977          2586.        4142. 8328097. 1.16e7
## # … with 2 more variables: mean_gdp_billion <dbl>, sd_gdp_billion <dbl>

mutate vs. summarize

It can be confusing to decide whether to use mutate or summarize. The key distinction is whether you want the output to have one row for each group (use summarize()) or one row for each row in the original data frame (use mutate()).

Note that if you use an aggregation function such as mean() within mutate() without using group_by(), you'll simply do the summary over all the rows of the input data frame.

And if you use an aggregation function such as mean() within summarize() without using group_by(), you'll simply create an output data frame with one row (i.e., the whole input data frame is a single group).
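The row counts make the difference easy to see. Here is a minimal sketch on an invented data frame `toy` with two continents and four rows.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa", "Asia"),
  gdpPercap = c(100, 200, 300, 400)
)

by_group <- toy %>% group_by(continent)

# summarize(): one row per group
summarized <- by_group %>% summarize(mean_gdp = mean(gdpPercap))

# mutate(): one row per original row, with the group mean repeated
mutated <- by_group %>% mutate(mean_gdp = mean(gdpPercap))

# Without group_by(), the whole data frame acts as a single group
overall <- toy %>% summarize(mean_gdp = mean(gdpPercap))

nrow(summarized)  # 2
nrow(mutated)     # 4
nrow(overall)     # 1
```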

dplyr::arrange

As a last step, let's say we want to sort the rows in our data frame according to values in a certain column. We can use the arrange() function to do this. For instance, let's organize our rows by year (recent first), and then by continent.

gap_with_extra_vars <- gap %>%
    group_by(continent, year) %>%
    mutate(mean_gdpPercap = mean(gdpPercap),
           sd_gdpPercap = sd(gdpPercap),
           mean_pop = mean(pop),
           sd_pop = sd(pop)) %>%
    arrange(desc(year), continent)
head(gap_with_extra_vars)
## # A tibble: 6 x 10
## # Groups:   continent, year [1]
##   country  year    pop continent lifeExp gdpPercap mean_gdpPercap
##   <chr>   <int>  <dbl> <chr>       <dbl>     <dbl>          <dbl>
## 1 Algeria  2007 3.33e7 Africa       72.3     6223.          3089.
## 2 Angola   2007 1.24e7 Africa       42.7     4797.          3089.
## 3 Benin    2007 8.08e6 Africa       56.7     1441.          3089.
## 4 Botswa…  2007 1.64e6 Africa       50.7    12570.          3089.
## 5 Burkin…  2007 1.43e7 Africa       52.3     1217.          3089.
## 6 Burundi  2007 8.39e6 Africa       49.6      430.          3089.
## # … with 3 more variables: sd_gdpPercap <dbl>, mean_pop <dbl>,
## #   sd_pop <dbl>
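The sorting rule is easier to follow on a few rows. In this sketch (with an invented data frame `toy`), desc(year) puts the most recent year first, and continent breaks ties within a year.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  continent = c("Asia", "Africa", "Asia", "Africa"),
  year      = c(1952, 2007, 2007, 1952)
)

# Sort by year (descending), then by continent (ascending) within each year
sorted <- toy %>% arrange(desc(year), continent)
sorted
```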

dplyr Take-aways

# without pipes:
gap_with_extra_vars <- arrange(
    mutate(
      group_by(gap, continent, year),
      mean_gdpPercap = mean(gdpPercap)
      ),
    desc(year), continent)
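The nested form is read from the inside out, whereas a piped form is read top to bottom. The sketch below, on an invented data frame `toy`, writes the same steps both ways and checks that the results agree.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  continent = c("Asia", "Africa", "Asia", "Africa"),
  year      = c(1952, 2007, 2007, 1952),
  gdpPercap = c(100, 200, 300, 400)
)

# Nested calls: the innermost function runs first
nested <- arrange(
  mutate(
    group_by(toy, continent, year),
    mean_gdpPercap = mean(gdpPercap)
  ),
  desc(year), continent
)

# Piped calls: the steps appear in the order they run
piped <- toy %>%
  group_by(continent, year) %>%
  mutate(mean_gdpPercap = mean(gdpPercap)) %>%
  arrange(desc(year), continent)

# Compare the underlying data, ignoring the tibble wrapper
all.equal(as.data.frame(nested), as.data.frame(piped))
```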

dplyr and "non-standard evaluation"

You may run across the term "non-standard evaluation". The use of dataframe variables without quotes around them is an example of this.

Why is this strange?

gap %>% select(continent, year)  %>% tail()

Compare it to:

head(gap[ , c('continent', 'year')])
##   continent year
## 1      Asia 1952
## 2      Asia 1957
## 3      Asia 1962
## 4      Asia 1967
## 5      Asia 1972
## 6      Asia 1977
## 815       Asia 2002
## 816       Asia 2007
## 817     Africa 1952
## 818     Africa 1957
## 819     Africa 1962
## 820     Africa 1967
## 821     Africa 1972
## 822     Africa 1977
## 823     Africa 1982
## 824     Africa 1987
## 825     Africa 1992
## 826     Africa 1997
## 827     Africa 2002
## 828     Africa 2007
## 829       Asia 1952
## 830       Asia 1957
## 831       Asia 1962
## 832       Asia 1967
## 833       Asia 1972
## 834       Asia 1977
## 835       Asia 1982
## 836       Asia 1987
## 837       Asia 1992
## 838       Asia 1997
## 839       Asia 2002
## 840       Asia 2007
## 841       Asia 1952
## 842       Asia 1957
## 843       Asia 1962
## 844       Asia 1967
## 845       Asia 1972
## 846       Asia 1977
## 847       Asia 1982
## 848       Asia 1987
## 849       Asia 1992
## 850       Asia 1997
## 851       Asia 2002
## 852       Asia 2007
## 853       Asia 1952
## 854       Asia 1957
## 855       Asia 1962
## 856       Asia 1967
## 857       Asia 1972
## 858       Asia 1977
## 859       Asia 1982
## 860       Asia 1987
## 861       Asia 1992
## 862       Asia 1997
## 863       Asia 2002
## 864       Asia 2007
## 865       Asia 1952
## 866       Asia 1957
## 867       Asia 1962
## 868       Asia 1967
## 869       Asia 1972
## 870       Asia 1977
## 871       Asia 1982
## 872       Asia 1987
## 873       Asia 1992
## 874       Asia 1997
## 875       Asia 2002
## 876       Asia 2007
## 877     Africa 1952
## 878     Africa 1957
## 879     Africa 1962
## 880     Africa 1967
## 881     Africa 1972
## 882     Africa 1977
## 883     Africa 1982
## 884     Africa 1987
## 885     Africa 1992
## 886     Africa 1997
## 887     Africa 2002
## 888     Africa 2007
## 889     Africa 1952
## 890     Africa 1957
## 891     Africa 1962
## 892     Africa 1967
## 893     Africa 1972
## 894     Africa 1977
## 895     Africa 1982
## 896     Africa 1987
## 897     Africa 1992
## 898     Africa 1997
## 899     Africa 2002
## 900     Africa 2007
## 901     Africa 1952
## 902     Africa 1957
## 903     Africa 1962
## 904     Africa 1967
## 905     Africa 1972
## 906     Africa 1977
## 907     Africa 1982
## 908     Africa 1987
## 909     Africa 1992
## 910     Africa 1997
## 911     Africa 2002
## 912     Africa 2007
## 913     Africa 1952
## 914     Africa 1957
## 915     Africa 1962
## 916     Africa 1967
## 917     Africa 1972
## 918     Africa 1977
## 919     Africa 1982
## 920     Africa 1987
## 921     Africa 1992
## 922     Africa 1997
## 923     Africa 2002
## 924     Africa 2007
## 925     Africa 1952
## 926     Africa 1957
## 927     Africa 1962
## 928     Africa 1967
## 929     Africa 1972
## 930     Africa 1977
## 931     Africa 1982
## 932     Africa 1987
## 933     Africa 1992
## 934     Africa 1997
## 935     Africa 2002
## 936     Africa 2007
## 937       Asia 1952
## 938       Asia 1957
## 939       Asia 1962
## 940       Asia 1967
## 941       Asia 1972
## 942       Asia 1977
## 943       Asia 1982
## 944       Asia 1987
## 945       Asia 1992
## 946       Asia 1997
## 947       Asia 2002
## 948       Asia 2007
## 949     Africa 1952
## 950     Africa 1957
## 951     Africa 1962
## 952     Africa 1967
## 953     Africa 1972
## 954     Africa 1977
## 955     Africa 1982
## 956     Africa 1987
## 957     Africa 1992
## 958     Africa 1997
## 959     Africa 2002
## 960     Africa 2007
## 961     Africa 1952
## 962     Africa 1957
## 963     Africa 1962
## 964     Africa 1967
## 965     Africa 1972
## 966     Africa 1977
## 967     Africa 1982
## 968     Africa 1987
## 969     Africa 1992
## 970     Africa 1997
## 971     Africa 2002
## 972     Africa 2007
## 973     Africa 1952
## 974     Africa 1957
## 975     Africa 1962
## 976     Africa 1967
## 977     Africa 1972
## 978     Africa 1977
## 979     Africa 1982
## 980     Africa 1987
## 981     Africa 1992
## 982     Africa 1997
## 983     Africa 2002
## 984     Africa 2007
## 985   Americas 1952
## 986   Americas 1957
## 987   Americas 1962
## 988   Americas 1967
## 989   Americas 1972
## 990   Americas 1977
## 991   Americas 1982
## 992   Americas 1987
## 993   Americas 1992
## 994   Americas 1997
## 995   Americas 2002
## 996   Americas 2007
## 997       Asia 1952
## 998       Asia 1957
## 999       Asia 1962
## 1000      Asia 1967
## 1001      Asia 1972
## 1002      Asia 1977
## 1003      Asia 1982
## 1004      Asia 1987
## 1005      Asia 1992
## 1006      Asia 1997
## 1007      Asia 2002
## 1008      Asia 2007
## 1009    Europe 1952
## 1010    Europe 1957
## 1011    Europe 1962
## 1012    Europe 1967
## 1013    Europe 1972
## 1014    Europe 1977
## 1015    Europe 1982
## 1016    Europe 1987
## 1017    Europe 1992
## 1018    Europe 1997
## 1019    Europe 2002
## 1020    Europe 2007
## 1021    Africa 1952
## 1022    Africa 1957
## 1023    Africa 1962
## 1024    Africa 1967
## 1025    Africa 1972
## 1026    Africa 1977
## 1027    Africa 1982
## 1028    Africa 1987
## 1029    Africa 1992
## 1030    Africa 1997
## 1031    Africa 2002
## 1032    Africa 2007
## 1033    Africa 1952
## 1034    Africa 1957
## 1035    Africa 1962
## 1036    Africa 1967
## 1037    Africa 1972
## 1038    Africa 1977
## 1039    Africa 1982
## 1040    Africa 1987
## 1041    Africa 1992
## 1042    Africa 1997
## 1043    Africa 2002
## 1044    Africa 2007
## 1045      Asia 1952
## 1046      Asia 1957
## 1047      Asia 1962
## 1048      Asia 1967
## 1049      Asia 1972
## 1050      Asia 1977
## 1051      Asia 1982
## 1052      Asia 1987
## 1053      Asia 1992
## 1054      Asia 1997
## 1055      Asia 2002
## 1056      Asia 2007
## 1057    Africa 1952
## 1058    Africa 1957
## 1059    Africa 1962
## 1060    Africa 1967
## 1061    Africa 1972
## 1062    Africa 1977
## 1063    Africa 1982
## 1064    Africa 1987
## 1065    Africa 1992
## 1066    Africa 1997
## 1067    Africa 2002
## 1068    Africa 2007
## 1069      Asia 1952
## 1070      Asia 1957
## 1071      Asia 1962
## 1072      Asia 1967
## 1073      Asia 1972
## 1074      Asia 1977
## 1075      Asia 1982
## 1076      Asia 1987
## 1077      Asia 1992
## 1078      Asia 1997
## 1079      Asia 2002
## 1080      Asia 2007
## 1081    Europe 1952
## 1082    Europe 1957
## 1083    Europe 1962
## 1084    Europe 1967
## 1085    Europe 1972
## 1086    Europe 1977
## 1087    Europe 1982
## 1088    Europe 1987
## 1089    Europe 1992
## 1090    Europe 1997
## 1091    Europe 2002
## 1092    Europe 2007
## 1093   Oceania 1952
## 1094   Oceania 1957
## 1095   Oceania 1962
## 1096   Oceania 1967
## 1097   Oceania 1972
## 1098   Oceania 1977
## 1099   Oceania 1982
## 1100   Oceania 1987
## 1101   Oceania 1992
## 1102   Oceania 1997
## 1103   Oceania 2002
## 1104   Oceania 2007
## 1105  Americas 1952
## 1106  Americas 1957
## 1107  Americas 1962
## 1108  Americas 1967
## 1109  Americas 1972
## 1110  Americas 1977
## 1111  Americas 1982
## 1112  Americas 1987
## 1113  Americas 1992
## 1114  Americas 1997
## 1115  Americas 2002
## 1116  Americas 2007
## 1117    Africa 1952
## 1118    Africa 1957
## 1119    Africa 1962
## 1120    Africa 1967
## 1121    Africa 1972
## 1122    Africa 1977
## 1123    Africa 1982
## 1124    Africa 1987
## 1125    Africa 1992
## 1126    Africa 1997
## 1127    Africa 2002
## 1128    Africa 2007
## 1129    Africa 1952
## 1130    Africa 1957
## 1131    Africa 1962
## 1132    Africa 1967
## 1133    Africa 1972
## 1134    Africa 1977
## 1135    Africa 1982
## 1136    Africa 1987
## 1137    Africa 1992
## 1138    Africa 1997
## 1139    Africa 2002
## 1140    Africa 2007
## 1141    Europe 1952
## 1142    Europe 1957
## 1143    Europe 1962
## 1144    Europe 1967
## 1145    Europe 1972
## 1146    Europe 1977
## 1147    Europe 1982
## 1148    Europe 1987
## 1149    Europe 1992
## 1150    Europe 1997
## 1151    Europe 2002
## 1152    Europe 2007
## 1153      Asia 1952
## 1154      Asia 1957
## 1155      Asia 1962
## 1156      Asia 1967
## 1157      Asia 1972
## 1158      Asia 1977
## 1159      Asia 1982
## 1160      Asia 1987
## 1161      Asia 1992
## 1162      Asia 1997
## 1163      Asia 2002
## 1164      Asia 2007
## 1165      Asia 1952
## 1166      Asia 1957
## 1167      Asia 1962
## 1168      Asia 1967
## 1169      Asia 1972
## 1170      Asia 1977
## 1171      Asia 1982
## 1172      Asia 1987
## 1173      Asia 1992
## 1174      Asia 1997
## 1175      Asia 2002
## 1176      Asia 2007
## 1177  Americas 1952
## 1178  Americas 1957
## 1179  Americas 1962
## 1180  Americas 1967
## 1181  Americas 1972
## 1182  Americas 1977
## 1183  Americas 1982
## 1184  Americas 1987
## 1185  Americas 1992
## 1186  Americas 1997
## 1187  Americas 2002
## 1188  Americas 2007
## 1189  Americas 1952
## 1190  Americas 1957
## 1191  Americas 1962
## 1192  Americas 1967
## 1193  Americas 1972
## 1194  Americas 1977
## 1195  Americas 1982
## 1196  Americas 1987
## 1197  Americas 1992
## 1198  Americas 1997
## 1199  Americas 2002
## 1200  Americas 2007
## 1201  Americas 1952
## 1202  Americas 1957
## 1203  Americas 1962
## 1204  Americas 1967
## 1205  Americas 1972
## 1206  Americas 1977
## 1207  Americas 1982
## 1208  Americas 1987
## 1209  Americas 1992
## 1210  Americas 1997
## 1211  Americas 2002
## 1212  Americas 2007
## 1213      Asia 1952
## 1214      Asia 1957
## 1215      Asia 1962
## 1216      Asia 1967
## 1217      Asia 1972
## 1218      Asia 1977
## 1219      Asia 1982
## 1220      Asia 1987
## 1221      Asia 1992
## 1222      Asia 1997
## 1223      Asia 2002
## 1224      Asia 2007
## 1225    Europe 1952
## 1226    Europe 1957
## 1227    Europe 1962
## 1228    Europe 1967
## 1229    Europe 1972
## 1230    Europe 1977
## 1231    Europe 1982
## 1232    Europe 1987
## 1233    Europe 1992
## 1234    Europe 1997
## 1235    Europe 2002
## 1236    Europe 2007
## 1237    Europe 1952
## 1238    Europe 1957
## 1239    Europe 1962
## 1240    Europe 1967
## 1241    Europe 1972
## 1242    Europe 1977
## 1243    Europe 1982
## 1244    Europe 1987
## 1245    Europe 1992
## 1246    Europe 1997
## 1247    Europe 2002
## 1248    Europe 2007
## 1249  Americas 1952
## 1250  Americas 1957
## 1251  Americas 1962
## 1252  Americas 1967
## 1253  Americas 1972
## 1254  Americas 1977
## 1255  Americas 1982
## 1256  Americas 1987
## 1257  Americas 1992
## 1258  Americas 1997
## 1259  Americas 2002
## 1260  Americas 2007
## 1261    Africa 1952
## 1262    Africa 1957
## 1263    Africa 1962
## 1264    Africa 1967
## 1265    Africa 1972
## 1266    Africa 1977
## 1267    Africa 1982
## 1268    Africa 1987
## 1269    Africa 1992
## 1270    Africa 1997
## 1271    Africa 2002
## 1272    Africa 2007
## 1273    Europe 1952
## 1274    Europe 1957
## 1275    Europe 1962
## 1276    Europe 1967
## 1277    Europe 1972
## 1278    Europe 1977
## 1279    Europe 1982
## 1280    Europe 1987
## 1281    Europe 1992
## 1282    Europe 1997
## 1283    Europe 2002
## 1284    Europe 2007
## 1285    Africa 1952
## 1286    Africa 1957
## 1287    Africa 1962
## 1288    Africa 1967
## 1289    Africa 1972
## 1290    Africa 1977
## 1291    Africa 1982
## 1292    Africa 1987
## 1293    Africa 1992
## 1294    Africa 1997
## 1295    Africa 2002
## 1296    Africa 2007
## 1297    Africa 1952
## 1298    Africa 1957
## 1299    Africa 1962
## 1300    Africa 1967
## 1301    Africa 1972
## 1302    Africa 1977
## 1303    Africa 1982
## 1304    Africa 1987
## 1305    Africa 1992
## 1306    Africa 1997
## 1307    Africa 2002
## 1308    Africa 2007
## 1309      Asia 1952
## 1310      Asia 1957
## 1311      Asia 1962
## 1312      Asia 1967
## 1313      Asia 1972
## 1314      Asia 1977
## 1315      Asia 1982
## 1316      Asia 1987
## 1317      Asia 1992
## 1318      Asia 1997
## 1319      Asia 2002
## 1320      Asia 2007
## 1321    Africa 1952
## 1322    Africa 1957
## 1323    Africa 1962
## 1324    Africa 1967
## 1325    Africa 1972
## 1326    Africa 1977
## 1327    Africa 1982
## 1328    Africa 1987
## 1329    Africa 1992
## 1330    Africa 1997
## 1331    Africa 2002
## 1332    Africa 2007
## 1333    Europe 1952
## 1334    Europe 1957
## 1335    Europe 1962
## 1336    Europe 1967
## 1337    Europe 1972
## 1338    Europe 1977
## 1339    Europe 1982
## 1340    Europe 1987
## 1341    Europe 1992
## 1342    Europe 1997
## 1343    Europe 2002
## 1344    Europe 2007
## 1345    Africa 1952
## 1346    Africa 1957
## 1347    Africa 1962
## 1348    Africa 1967
## 1349    Africa 1972
## 1350    Africa 1977
## 1351    Africa 1982
## 1352    Africa 1987
## 1353    Africa 1992
## 1354    Africa 1997
## 1355    Africa 2002
## 1356    Africa 2007
## 1357      Asia 1952
## 1358      Asia 1957
## 1359      Asia 1962
## 1360      Asia 1967
## 1361      Asia 1972
## 1362      Asia 1977
## 1363      Asia 1982
## 1364      Asia 1987
## 1365      Asia 1992
## 1366      Asia 1997
## 1367      Asia 2002
## 1368      Asia 2007
## 1369    Europe 1952
## 1370    Europe 1957
## 1371    Europe 1962
## 1372    Europe 1967
## 1373    Europe 1972
## 1374    Europe 1977
## 1375    Europe 1982
## 1376    Europe 1987
## 1377    Europe 1992
## 1378    Europe 1997
## 1379    Europe 2002
## 1380    Europe 2007
## 1381    Europe 1952
## 1382    Europe 1957
## 1383    Europe 1962
## 1384    Europe 1967
## 1385    Europe 1972
## 1386    Europe 1977
## 1387    Europe 1982
## 1388    Europe 1987
## 1389    Europe 1992
## 1390    Europe 1997
## 1391    Europe 2002
## 1392    Europe 2007
## 1393    Africa 1952
## 1394    Africa 1957
## 1395    Africa 1962
## 1396    Africa 1967
## 1397    Africa 1972
## 1398    Africa 1977
## 1399    Africa 1982
## 1400    Africa 1987
## 1401    Africa 1992
## 1402    Africa 1997
## 1403    Africa 2002
## 1404    Africa 2007
## 1405    Africa 1952
## 1406    Africa 1957
## 1407    Africa 1962
## 1408    Africa 1967
## 1409    Africa 1972
## 1410    Africa 1977
## 1411    Africa 1982
## 1412    Africa 1987
## 1413    Africa 1992
## 1414    Africa 1997
## 1415    Africa 2002
## 1416    Africa 2007
## 1417    Europe 1952
## 1418    Europe 1957
## 1419    Europe 1962
## 1420    Europe 1967
## 1421    Europe 1972
## 1422    Europe 1977
## 1423    Europe 1982
## 1424    Europe 1987
## 1425    Europe 1992
## 1426    Europe 1997
## 1427    Europe 2002
## 1428    Europe 2007
## 1429      Asia 1952
## 1430      Asia 1957
## 1431      Asia 1962
## 1432      Asia 1967
## 1433      Asia 1972
## 1434      Asia 1977
## 1435      Asia 1982
## 1436      Asia 1987
## 1437      Asia 1992
## 1438      Asia 1997
## 1439      Asia 2002
## 1440      Asia 2007
## 1441    Africa 1952
## 1442    Africa 1957
## 1443    Africa 1962
## 1444    Africa 1967
## 1445    Africa 1972
## 1446    Africa 1977
## 1447    Africa 1982
## 1448    Africa 1987
## 1449    Africa 1992
## 1450    Africa 1997
## 1451    Africa 2002
## 1452    Africa 2007
## 1453    Africa 1952
## 1454    Africa 1957
## 1455    Africa 1962
## 1456    Africa 1967
## 1457    Africa 1972
## 1458    Africa 1977
## 1459    Africa 1982
## 1460    Africa 1987
## 1461    Africa 1992
## 1462    Africa 1997
## 1463    Africa 2002
## 1464    Africa 2007
## 1465    Europe 1952
## 1466    Europe 1957
## 1467    Europe 1962
## 1468    Europe 1967
## 1469    Europe 1972
## 1470    Europe 1977
## 1471    Europe 1982
## 1472    Europe 1987
## 1473    Europe 1992
## 1474    Europe 1997
## 1475    Europe 2002
## 1476    Europe 2007
## 1477    Europe 1952
## 1478    Europe 1957
## 1479    Europe 1962
## 1480    Europe 1967
## 1481    Europe 1972
## 1482    Europe 1977
## 1483    Europe 1982
## 1484    Europe 1987
## 1485    Europe 1992
## 1486    Europe 1997
## 1487    Europe 2002
## 1488    Europe 2007
## 1489      Asia 1952
## 1490      Asia 1957
## 1491      Asia 1962
## 1492      Asia 1967
## 1493      Asia 1972
## 1494      Asia 1977
## 1495      Asia 1982
## 1496      Asia 1987
## 1497      Asia 1992
## 1498      Asia 1997
## 1499      Asia 2002
## 1500      Asia 2007
## 1501      Asia 1952
## 1502      Asia 1957
## 1503      Asia 1962
## 1504      Asia 1967
## 1505      Asia 1972
## 1506      Asia 1977
## 1507      Asia 1982
## 1508      Asia 1987
## 1509      Asia 1992
## 1510      Asia 1997
## 1511      Asia 2002
## 1512      Asia 2007
## 1513    Africa 1952
## 1514    Africa 1957
## 1515    Africa 1962
## 1516    Africa 1967
## 1517    Africa 1972
## 1518    Africa 1977
## 1519    Africa 1982
## 1520    Africa 1987
## 1521    Africa 1992
## 1522    Africa 1997
## 1523    Africa 2002
## 1524    Africa 2007
## 1525      Asia 1952
## 1526      Asia 1957
## 1527      Asia 1962
## 1528      Asia 1967
## 1529      Asia 1972
## 1530      Asia 1977
## 1531      Asia 1982
## 1532      Asia 1987
## 1533      Asia 1992
## 1534      Asia 1997
## 1535      Asia 2002
## 1536      Asia 2007
## 1537    Africa 1952
## 1538    Africa 1957
## 1539    Africa 1962
## 1540    Africa 1967
## 1541    Africa 1972
## 1542    Africa 1977
## 1543    Africa 1982
## 1544    Africa 1987
## 1545    Africa 1992
## 1546    Africa 1997
## 1547    Africa 2002
## 1548    Africa 2007
## 1549  Americas 1952
## 1550  Americas 1957
## 1551  Americas 1962
## 1552  Americas 1967
## 1553  Americas 1972
## 1554  Americas 1977
## 1555  Americas 1982
## 1556  Americas 1987
## 1557  Americas 1992
## 1558  Americas 1997
## 1559  Americas 2002
## 1560  Americas 2007
## 1561    Africa 1952
## 1562    Africa 1957
## 1563    Africa 1962
## 1564    Africa 1967
## 1565    Africa 1972
## 1566    Africa 1977
## 1567    Africa 1982
## 1568    Africa 1987
## 1569    Africa 1992
## 1570    Africa 1997
## 1571    Africa 2002
## 1572    Africa 2007
## 1573    Europe 1952
## 1574    Europe 1957
## 1575    Europe 1962
## 1576    Europe 1967
## 1577    Europe 1972
## 1578    Europe 1977
## 1579    Europe 1982
## 1580    Europe 1987
## 1581    Europe 1992
## 1582    Europe 1997
## 1583    Europe 2002
## 1584    Europe 2007
## 1585    Africa 1952
## 1586    Africa 1957
## 1587    Africa 1962
## 1588    Africa 1967
## 1589    Africa 1972
## 1590    Africa 1977
## 1591    Africa 1982
## 1592    Africa 1987
## 1593    Africa 1992
## 1594    Africa 1997
## 1595    Africa 2002
## 1596    Africa 2007
## 1597    Europe 1952
## 1598    Europe 1957
## 1599    Europe 1962
## 1600    Europe 1967
## 1601    Europe 1972
## 1602    Europe 1977
## 1603    Europe 1982
## 1604    Europe 1987
## 1605    Europe 1992
## 1606    Europe 1997
## 1607    Europe 2002
## 1608    Europe 2007
## 1609  Americas 1952
## 1610  Americas 1957
## 1611  Americas 1962
## 1612  Americas 1967
## 1613  Americas 1972
## 1614  Americas 1977
## 1615  Americas 1982
## 1616  Americas 1987
## 1617  Americas 1992
## 1618  Americas 1997
## 1619  Americas 2002
## 1620  Americas 2007
## 1621  Americas 1952
## 1622  Americas 1957
## 1623  Americas 1962
## 1624  Americas 1967
## 1625  Americas 1972
## 1626  Americas 1977
## 1627  Americas 1982
## 1628  Americas 1987
## 1629  Americas 1992
## 1630  Americas 1997
## 1631  Americas 2002
## 1632  Americas 2007
## 1633  Americas 1952
## 1634  Americas 1957
## 1635  Americas 1962
## 1636  Americas 1967
## 1637  Americas 1972
## 1638  Americas 1977
## 1639  Americas 1982
## 1640  Americas 1987
## 1641  Americas 1992
## 1642  Americas 1997
## 1643  Americas 2002
## 1644  Americas 2007
## 1645      Asia 1952
## 1646      Asia 1957
## 1647      Asia 1962
## 1648      Asia 1967
## 1649      Asia 1972
## 1650      Asia 1977
## 1651      Asia 1982
## 1652      Asia 1987
## 1653      Asia 1992
## 1654      Asia 1997
## 1655      Asia 2002
## 1656      Asia 2007
## 1657      Asia 1952
## 1658      Asia 1957
## 1659      Asia 1962
## 1660      Asia 1967
## 1661      Asia 1972
## 1662      Asia 1977
## 1663      Asia 1982
## 1664      Asia 1987
## 1665      Asia 1992
## 1666      Asia 1997
## 1667      Asia 2002
## 1668      Asia 2007
## 1669      Asia 1952
## 1670      Asia 1957
## 1671      Asia 1962
## 1672      Asia 1967
## 1673      Asia 1972
## 1674      Asia 1977
## 1675      Asia 1982
## 1676      Asia 1987
## 1677      Asia 1992
## 1678      Asia 1997
## 1679      Asia 2002
## 1680      Asia 2007
## 1681    Africa 1952
## 1682    Africa 1957
## 1683    Africa 1962
## 1684    Africa 1967
## 1685    Africa 1972
## 1686    Africa 1977
## 1687    Africa 1982
## 1688    Africa 1987
## 1689    Africa 1992
## 1690    Africa 1997
## 1691    Africa 2002
## 1692    Africa 2007
## 1693    Africa 1952
## 1694    Africa 1957
## 1695    Africa 1962
## 1696    Africa 1967
## 1697    Africa 1972
## 1698    Africa 1977
## 1699    Africa 1982
## 1700    Africa 1987
## 1701    Africa 1992
## 1702    Africa 1997
## 1703    Africa 2002
## 1704    Africa 2007
gap[ , continent]
## Error in `[.data.frame`(gap, , continent): object 'continent' not found

This fails because continent and year are not variables in our current environment! dplyr does some fancy stuff behind the scenes to save us from typing the quotes.
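
To make the contrast concrete, here is a minimal sketch on a made-up data frame (toy is a hypothetical stand-in for gap, not part of the bootcamp data). Base R indexing needs the column name as a quoted string, whereas select() accepts the bare name and looks it up among the columns.

```r
library(dplyr)

# 'toy' is a made-up stand-in for the gapminder data
toy <- data.frame(
  continent = c("Asia", "Europe", "Africa"),
  year = c(1952, 1957, 1962),
  stringsAsFactors = FALSE
)

# Base R: the column name must be a quoted string
base_result <- toy[ , "continent"]

# dplyr: the bare name is looked up among the columns of 'toy'
dplyr_result <- toy %>% select(continent)

base_result
dplyr_result
```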

This is fine in an interactive data analysis workflow, but if you want to write a function that, for example, selects an arbitrary set of columns, you'll run into trouble.

## here's a helper function that computes the mean of a variable, stratifying by a grouping variable
grouped_mean <- function(data, group_var, summary_var) {
  data %>%
    group_by(group_var) %>%
    summarise(mean = mean(summary_var))
}
gap %>% grouped_mean(continent, lifeExp)
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown
gap %>% grouped_mean('continent', 'lifeExp')
## Error in grouped_df_impl(data, unname(vars), drop): Column `group_var` is unknown

See the rlang or seplyr packages for ways to deal with this problem when writing functions.
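
As one possible approach, the sketch below uses the enquo() and !! tools that dplyr re-exports from rlang. The function name grouped_mean2 and the toy data frame are made up for illustration; this is just one way to handle the problem, not the only one.

```r
library(dplyr)

# Made-up stand-in for the gapminder data
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  lifeExp = c(40, 50, 70, 80),
  stringsAsFactors = FALSE
)

# enquo() captures the expression the caller typed; !! inserts it
# back into the dplyr call so it is looked up among the columns.
grouped_mean2 <- function(data, group_var, summary_var) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)
  data %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}

result <- toy %>% grouped_mean2(continent, lifeExp)
result
```

If you have rlang 0.4.0 or later installed, the shorthand {{ group_var }} inside the dplyr call should do the same job as the enquo() and !! pair.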

Tidying Data

Even before we conduct analysis or calculations, we need to put our data into the correct format. The goal here is to rearrange a messy dataset into one that is tidy.

The two most important properties of tidy data are:

  1. Each column is a variable.
  2. Each row is an observation.

Tidy data is easier to work with, because you have a consistent way of referring to variables (as column names) and observations (as row indices). It then becomes easy to manipulate, visualize, and model.

For more on the concept of tidy data, read Hadley Wickham's paper "Tidy Data" (Journal of Statistical Software, 2014).

Tidying Data/Wide vs. Long Formats

"Tidy datasets are all alike but every messy dataset is messy in its own way." – Hadley Wickham

Tabular datasets can be arranged in many ways. For instance, consider the data below. Each data set displays information on heart rate observed in individuals across 3 different time periods. But the data are organized differently in each table.

wide <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory"),
  time1 = c(67, 80, 64),
  time2 = c(56, 90, 50),
  time3 = c(70, 67, 101)
)
wide
##      name time1 time2 time3
## 1  Wilbur    67    56    70
## 2 Petunia    80    90    67
## 3 Gregory    64    50   101
long <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory", "Wilbur", "Petunia", "Gregory", "Wilbur", "Petunia", "Gregory"),
  time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  heartrate = c(67, 80, 64, 56, 90, 50, 70, 67, 101)
)
long
##      name time heartrate
## 1  Wilbur    1        67
## 2 Petunia    1        80
## 3 Gregory    1        64
## 4  Wilbur    2        56
## 5 Petunia    2        90
## 6 Gregory    2        50
## 7  Wilbur    3        70
## 8 Petunia    3        67
## 9 Gregory    3       101

Question: Which one of these do you think is the tidy format?

Answer: The second data frame (the "long" one) is tidy. The first data frame (the "wide" one) would not be considered tidy because values of a single variable (i.e., heartrate) are spread across multiple columns.

We often refer to these different structures as "long" vs. "wide" formats. In the "long" format, you usually have 1 column for the observed variable and the other columns are ID variables.

For the "wide" format each row is often a site/subject/patient and you have multiple observation variables containing the same type of data. These can be either repeated observations over time, or observation of multiple variables (or a mix of both). In the above case, we had the same kind of data (heart rate) entered across 3 different columns, corresponding to three different time periods.

You may find that data entry is simpler in the "wide" format, and some programs/functions prefer it. However, many of R’s functions have been designed assuming you have "long" format data.
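
As a small illustration of why the long format is convenient, the sketch below uses base R's aggregate() on heart-rate values taken from the wide table above. The object names mean_by_name and mean_by_time are made up for this example.

```r
# Heart-rate values from the wide table above, arranged in long format
long <- data.frame(
  name = rep(c("Wilbur", "Petunia", "Gregory"), times = 3),
  time = rep(1:3, each = 3),
  heartrate = c(67, 80, 64, 56, 90, 50, 70, 67, 101)
)

# Because all heart rates sit in one column, a single formula
# summarizes them by any ID variable
mean_by_name <- aggregate(heartrate ~ name, data = long, FUN = mean)
mean_by_time <- aggregate(heartrate ~ time, data = long, FUN = mean)

mean_by_name
mean_by_time
```

With the wide layout, the same per-person mean would require naming each of the time1, time2, and time3 columns explicitly.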

Tidying the Gapminder Data

Let's look at the structure of our original gapminder dataframe:

head(gap)
##       country year      pop continent lifeExp gdpPercap
## 1 Afghanistan 1952  8425333      Asia  28.801  779.4453
## 2 Afghanistan 1957  9240934      Asia  30.332  820.8530
## 3 Afghanistan 1962 10267083      Asia  31.997  853.1007
## 4 Afghanistan 1967 11537966      Asia  34.020  836.1971
## 5 Afghanistan 1972 13079460      Asia  36.088  739.9811
## 6 Afghanistan 1977 14880372      Asia  38.438  786.1134

Question: Is this data frame wide or long?

Answer: This data frame is somewhere in between the purely 'long' and 'wide' formats. We have 3 "ID variables" (continent, country, year) and 3 "Observation variables" (pop, lifeExp, gdpPercap).

Despite not having ALL observations in 1 column, this intermediate format makes sense given that all 3 observation variables have different units. As we have seen, many R functions are vector based, and you usually do not want to do mathematical operations on values with different units.

On the other hand, there are some instances in which a purely long or wide format is ideal (e.g. plotting). Likewise, sometimes you'll get data on your desk that is poorly organized, and you'll need to reshape it.

tidyr

Thankfully, the tidyr package will help you efficiently transform your data regardless of original format.

# Install the "tidyr" package (only necessary one time)
# install.packages("tidyr") # Not Run

# Load the "tidyr" package (necessary every new R session)
library(tidyr)

tidyr::gather

Until now, we've been using the nicely formatted original gapminder data set. This data set is not quite wide and not quite long -- it's something in the middle, but "real" data (i.e., our own research data) will never be so well organized. Here let's start with the wide format version of the gapminder data set.

gap_wide <- read.csv("../data/gapminder_wide.csv", stringsAsFactors = FALSE)
head(gap_wide)
##   continent      country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962
## 1    Africa      Algeria      2449.0082      3013.9760      2550.8169
## 2    Africa       Angola      3520.6103      3827.9405      4269.2767
## 3    Africa        Benin      1062.7522       959.6011       949.4991
## 4    Africa     Botswana       851.2411       918.2325       983.6540
## 5    Africa Burkina Faso       543.2552       617.1835       722.5120
## 6    Africa      Burundi       339.2965       379.5646       355.2032
##   gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982
## 1      3246.9918      4182.6638      4910.4168      5745.1602
## 2      5522.7764      5473.2880      3008.6474      2756.9537
## 3      1035.8314      1085.7969      1029.1613      1277.8976
## 4      1214.7093      2263.6111      3214.8578      4551.1421
## 5       794.8266       854.7360       743.3870       807.1986
## 6       412.9775       464.0995       556.1033       559.6032
##   gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002
## 1      5681.3585      5023.2166      4797.2951      5288.0404
## 2      2430.2083      2627.8457      2277.1409      2773.2873
## 3      1225.8560      1191.2077      1232.9753      1372.8779
## 4      6205.8839      7954.1116      8647.1423     11003.6051
## 5       912.0631       931.7528       946.2950      1037.6452
## 6       621.8188       631.6999       463.1151       446.4035
##   gdpPercap_2007 lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967
## 1      6223.3675       43.077       45.685       48.303       51.407
## 2      4797.2313       30.015       31.999       34.000       35.985
## 3      1441.2849       38.223       40.358       42.618       44.885
## 4     12569.8518       47.622       49.618       51.520       53.298
## 5      1217.0330       31.975       34.906       37.814       40.697
## 6       430.0707       39.031       40.533       42.045       43.548
##   lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987 lifeExp_1992
## 1       54.518       58.014       61.368       65.799       67.744
## 2       37.928       39.483       39.942       39.906       40.647
## 3       47.014       49.190       50.904       52.337       53.919
## 4       56.024       59.319       61.484       63.622       62.745
## 5       43.591       46.137       48.122       49.557       50.260
## 6       44.057       45.910       47.471       48.211       44.736
##   lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962
## 1       69.152       70.994       72.301  9279525 10270856 11000948
## 2       40.963       41.003       42.731  4232095  4561361  4826015
## 3       54.777       54.406       56.728  1738315  1925173  2151895
## 4       52.556       46.634       50.728   442308   474639   512764
## 5       50.324       50.650       52.295  4469979  4713416  4919632
## 6       45.326       47.360       49.580  2445618  2667518  2961915
##   pop_1967 pop_1972 pop_1977 pop_1982 pop_1987 pop_1992 pop_1997 pop_2002
## 1 12760499 14760787 17152804 20033753 23254956 26298373 29072015 31287142
## 2  5247469  5894858  6162675  7016384  7874230  8735988  9875024 10866106
## 3  2427334  2761407  3168267  3641603  4243788  4981671  6066080  7026113
## 4   553541   619351   781472   970347  1151184  1342614  1536536  1630347
## 5  5127935  5433886  5889574  6634596  7586551  8878303 10352843 12251209
## 6  3330989  3529983  3834415  4580410  5126023  5809236  6121610  7021078
##   pop_2007
## 1 33333216
## 2 12420476
## 3  8078314
## 4  1639131
## 5 14326203
## 6  8390505

The first step towards getting our nice intermediate data format is to convert from the wide to the long format. The function gather() will 'gather' the observation variables into a single variable. This is sometimes called "melting" your data, because it melts the table from wide to long. Those data will be melted into two variables: one for the variable names, and the other for the variable values.

gap_long <- gap_wide %>%
    gather(obstype_year, obs_values, 3:38)
head(gap_long)
##   continent      country   obstype_year obs_values
## 1    Africa      Algeria gdpPercap_1952  2449.0082
## 2    Africa       Angola gdpPercap_1952  3520.6103
## 3    Africa        Benin gdpPercap_1952  1062.7522
## 4    Africa     Botswana gdpPercap_1952   851.2411
## 5    Africa Burkina Faso gdpPercap_1952   543.2552
## 6    Africa      Burundi gdpPercap_1952   339.2965

Notice that we put 3 arguments into the gather() function:

  1. the name of the new column for the new ID variable (obstype_year),
  2. the name for the new amalgamated observation variable (obs_values),
  3. the indices of the old variables (3:38, signalling columns 3 through 38) that we want to gather into one variable. Notice that we don't want to melt down columns 1 and 2, as these are considered "ID" variables.
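
To see the three arguments in isolation, here is a minimal sketch on a made-up data frame. The name toy_wide and its values are hypothetical and not part of the gapminder data. It passes key and value by name, which should be equivalent to passing them by position as above.

```r
library(tidyr)

# Hypothetical toy data: one ID column and two measurement columns
toy_wide <- data.frame(country = c("A", "B"),
                       pop_1952 = c(10, 20),
                       pop_1957 = c(11, 22),
                       stringsAsFactors = FALSE)

# key   = name of the new column holding the old column names
# value = name of the new column holding the old cell values
# 2:3   = positions of the columns to gather (column 1 is the ID variable)
toy_long <- gather(toy_wide, key = "obstype_year", value = "obs_values", 2:3)
toy_long
```

Each of the two gathered columns contributes one row per country, so the two-row wide table becomes a four-row long table.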

tidyr::select

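If there are a lot of columns or they're named in a consistent pattern, we might not want to select them using the column numbers. It'd be easier to use some information contained in the names themselves. We can select variables by name, by name pattern (for example with starts_with()), or by excluding the columns we want to keep as ID variables with the - operator.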

See the select() function in dplyr for more options.

For instance, here we do the same gather operation with (1) the starts_with function, and (2) the - operator:

# with the starts_with() function
gap_long <- gap_wide %>%
    gather(obstype_year, obs_values, starts_with('pop'),
           starts_with('lifeExp'), starts_with('gdpPercap'))
head(gap_long)
##   continent      country obstype_year obs_values
## 1    Africa      Algeria     pop_1952    9279525
## 2    Africa       Angola     pop_1952    4232095
## 3    Africa        Benin     pop_1952    1738315
## 4    Africa     Botswana     pop_1952     442308
## 5    Africa Burkina Faso     pop_1952    4469979
## 6    Africa      Burundi     pop_1952    2445618
# with the - operator
gap_long <- gap_wide %>%
  gather(obstype_year, obs_values, -continent, -country)
head(gap_long)
##   continent      country   obstype_year obs_values
## 1    Africa      Algeria gdpPercap_1952  2449.0082
## 2    Africa       Angola gdpPercap_1952  3520.6103
## 3    Africa        Benin gdpPercap_1952  1062.7522
## 4    Africa     Botswana gdpPercap_1952   851.2411
## 5    Africa Burkina Faso gdpPercap_1952   543.2552
## 6    Africa      Burundi gdpPercap_1952   339.2965

However you choose to do it, notice that the output collapses all of the measure variables into two columns: one containing the new ID variable, the other containing the observation value for that row.
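
As a quick check that the three ways of selecting columns agree, here is a small sketch on a hypothetical toy data frame (toy_wide and its values are invented for illustration). The measurement columns are listed in the same order in every call. That matters because the rows of the result follow the order in which the columns are selected, which is why the starts_with() output above begins with pop rather than gdpPercap.

```r
library(tidyr)

# Hypothetical toy data with two ID columns and two measurement columns
toy_wide <- data.frame(continent = c("X", "X"),
                       country = c("A", "B"),
                       pop_1952 = c(10, 20),
                       lifeExp_1952 = c(50, 60),
                       stringsAsFactors = FALSE)

# 1. select by column position
by_index <- gather(toy_wide, obstype_year, obs_values, 3:4)

# 2. select by name prefix
by_name <- gather(toy_wide, obstype_year, obs_values,
                  starts_with("pop"), starts_with("lifeExp"))

# 3. select everything except the ID columns
by_exclusion <- gather(toy_wide, obstype_year, obs_values,
                       -continent, -country)

# Compare the three results
all.equal(by_index, by_name)
all.equal(by_index, by_exclusion)
```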

tidyr::separate

You'll notice that in our long dataset, obstype_year actually contains 2 pieces of information, the observation type (pop, lifeExp, or gdpPercap) and the year.

We can use the separate() function to split the character strings into multiple variables:

gap_long_sep <- gap_long %>%
  separate(obstype_year, into = c('obs_type','year'), sep = "_") %>%
  mutate(year = as.integer(year))
head(gap_long_sep)
##   continent      country  obs_type year obs_values
## 1    Africa      Algeria gdpPercap 1952  2449.0082
## 2    Africa       Angola gdpPercap 1952  3520.6103
## 3    Africa        Benin gdpPercap 1952  1062.7522
## 4    Africa     Botswana gdpPercap 1952   851.2411
## 5    Africa Burkina Faso gdpPercap 1952   543.2552
## 6    Africa      Burundi gdpPercap 1952   339.2965

If you didn't use tidyr to do this, you'd have to use the strsplit function and use multiple lines of code to replace the column in gap_long with two new columns. This solution is much cleaner.
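
For comparison, a base R version might look roughly like the following sketch. It uses a hypothetical two-row toy_long data frame rather than the real gap_long, and it is only one of several ways to do this with strsplit().

```r
# Hypothetical toy long-format data
toy_long <- data.frame(obstype_year = c("pop_1952", "lifeExp_1957"),
                       obs_values = c(10, 50),
                       stringsAsFactors = FALSE)

# Split each string on "_"; the result is a list of character vectors
parts <- strsplit(toy_long$obstype_year, "_", fixed = TRUE)

# Pull the first and second piece out of every list element
toy_long$obs_type <- vapply(parts, `[`, character(1), 1)
toy_long$year <- as.integer(vapply(parts, `[`, character(1), 2))

# Drop the original combined column
toy_long$obstype_year <- NULL
toy_long
```

The splitting, extraction, type conversion, and column removal each take their own step here, whereas separate() handles the split and the column replacement in a single call.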

tidyr::spread

The opposite of gather() is spread(). It spreads our observation variables back out to make a wider table. We can use this function to spread gap_long_sep back to the original "medium" format.

gap_medium <- gap_long_sep %>%
  spread(obs_type, obs_values)
head(gap_medium)
##   continent country year gdpPercap lifeExp      pop
## 1    Africa Algeria 1952  2449.008  43.077  9279525
## 2    Africa Algeria 1957  3013.976  45.685 10270856
## 3    Africa Algeria 1962  2550.817  48.303 11000948
## 4    Africa Algeria 1967  3246.992  51.407 12760499
## 5    Africa Algeria 1972  4182.664  54.518 14760787
## 6    Africa Algeria 1977  4910.417  58.014 17152804

All we need are a few quick fixes to make this dataset identical to the original gap dataset:

gap <- read.csv("../data/gapminder-FiveYearData.csv")
head(gap_medium)
##   continent country year gdpPercap lifeExp      pop
## 1    Africa Algeria 1952  2449.008  43.077  9279525
## 2    Africa Algeria 1957  3013.976  45.685 10270856
## 3    Africa Algeria 1962  2550.817  48.303 11000948
## 4    Africa Algeria 1967  3246.992  51.407 12760499
## 5    Africa Algeria 1972  4182.664  54.518 14760787
## 6    Africa Algeria 1977  4910.417  58.014 17152804
head(gap)
##       country year      pop continent lifeExp gdpPercap
## 1 Afghanistan 1952  8425333      Asia  28.801  779.4453
## 2 Afghanistan 1957  9240934      Asia  30.332  820.8530
## 3 Afghanistan 1962 10267083      Asia  31.997  853.1007
## 4 Afghanistan 1967 11537966      Asia  34.020  836.1971
## 5 Afghanistan 1972 13079460      Asia  36.088  739.9811
## 6 Afghanistan 1977 14880372      Asia  38.438  786.1134
# rearrange columns
gap_medium <- gap_medium[,names(gap)]
head(gap_medium)
##   country year      pop continent lifeExp gdpPercap
## 1 Algeria 1952  9279525    Africa  43.077  2449.008
## 2 Algeria 1957 10270856    Africa  45.685  3013.976
## 3 Algeria 1962 11000948    Africa  48.303  2550.817
## 4 Algeria 1967 12760499    Africa  51.407  3246.992
## 5 Algeria 1972 14760787    Africa  54.518  4182.664
## 6 Algeria 1977 17152804    Africa  58.014  4910.417
# arrange by country, continent, and year
gap_medium <- gap_medium %>%
  arrange(country,continent,year)
head(gap_medium)
##       country year      pop continent lifeExp gdpPercap
## 1 Afghanistan 1952  8425333      Asia  28.801  779.4453
## 2 Afghanistan 1957  9240934      Asia  30.332  820.8530
## 3 Afghanistan 1962 10267083      Asia  31.997  853.1007
## 4 Afghanistan 1967 11537966      Asia  34.020  836.1971
## 5 Afghanistan 1972 13079460      Asia  36.088  739.9811
## 6 Afghanistan 1977 14880372      Asia  38.438  786.1134

What we just told you will become obsolete...

gather and spread are being replaced by pivot_longer and pivot_wider, which use ideas from the cdata package to make reshaping easier to think about.
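
As a rough sketch of what the newer interface looks like, the following reshapes a hypothetical toy data frame (toy_wide, invented for illustration) from wide to long and then to the "medium" format. It assumes a version of tidyr that provides pivot_longer() and pivot_wider() (1.0.0 or later). Note that pivot_longer() can split the column names on "_" itself, so no separate() step is needed.

```r
library(tidyr)  # pivot_longer()/pivot_wider() assume tidyr >= 1.0.0

# Hypothetical toy data in wide format
toy_wide <- data.frame(country = c("A", "B"),
                       pop_1952 = c(10, 20),
                       pop_1957 = c(11, 22),
                       lifeExp_1952 = c(50, 60),
                       lifeExp_1957 = c(51, 61),
                       stringsAsFactors = FALSE)

# Wide to long: split names like "pop_1952" into obs_type and year
toy_long <- pivot_longer(toy_wide,
                         cols = -country,
                         names_to = c("obs_type", "year"),
                         names_sep = "_",
                         values_to = "obs_values")
toy_long$year <- as.integer(toy_long$year)

# Long to medium: one column per observation type
toy_medium <- pivot_wider(toy_long,
                          names_from = obs_type,
                          values_from = obs_values)
toy_medium
```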

Extra Resources

dplyr and tidyr have many more functions to help you wrangle and manipulate your data. See the Data Wrangling Cheat Sheet for more.

There are some other useful packages in the tidyverse:

Breakout

dplyr

  1. Use dplyr to create a data frame containing the median lifeExp for each continent

  2. Use dplyr to add a column to the gapminder dataset that contains the total population of the continent of each observation in a given year. For example, if the first observation is Afghanistan in 1952, the new column would contain the population of Asia in 1952.

  3. Use dplyr to add a column called gdpPercap_diff that contains the difference between the observation's gdpPercap and the mean gdpPercap of the continent in that year. Arrange the data frame by the column you just created, in descending order (so that the relatively richest country/years are listed first).

tidyr

  1. Subset the results from question #3 to select only the country, year, and gdpPercap_diff columns. Use tidyr to put it in wide format so that countries are rows and years are columns.

Hint: you'll probably see a message about a missing grouping variable. If you don't want continent included, you can pass the output of problem 3 through ungroup() to get rid of the continent information.